Robust ANPR System Using YOLOv8 and TrOCR with Aspect-Ratio-Based Splitting and Post-Processing

Authors: Deon Menezes, Sajal Patel, Abdul Shaikh, Nishaad Parikh, Nishaad Parikh

DOI Link: https://doi.org/10.22214/ijraset.2025.74265

Abstract

In this paper we present a modular and efficient Automatic Number Plate Recognition (ANPR) system which combines YOLOv8 - based plate detection and transformer-based OCR via TrOCR. A novel aspect-ratio-based splitting strategy helps it recognize vertically stacked number plates. The system has webcam capture and logs cleaned time-stamped output as part of a clear process to follow at some later point in time. Our approach has significant improvements to accuracy and robustness.

Introduction

To develop a high-confidence, real-world ANPR system for applications like traffic monitoring, tolling, parking, and security, overcoming challenges of traditional OCR in uncontrolled environments (e.g., variable lighting, plate formats).

1. System Overview

The pipeline integrates:

Image Input (camera or static image)
Number Plate Detection using YOLOv8 (custom-trained)
Aspect-Ratio-Based Alignment (for stacked plates)
OCR with TrOCR (Microsoft’s Transformer OCR model)
Post-processing (cleaning predictions)
Logging with timestamped results
Optional Image Display for inspection

2. Key Components

YOLOv8: Detects license plates with high accuracy using bounding boxes.
Aspect Ratio Heuristic: Handles vertically stacked plates by splitting and re-aligning them horizontally if height-to-width ratio < 2.3.
TrOCR: Uses a Vision Transformer encoder and GPT-2 decoder for OCR.
Post-processing:
- Converts text to uppercase.
- Removes non-alphanumeric characters to improve readability and correctness (e.g., "tn43-j@0158" → "TN43J0158").
Logging: Saves OCR results with timestamps and images for debugging.

3. Challenges Addressed

Two-Row Plates: Handled with aspect ratio splitting, though still error-prone for angled/skewed inputs.
Noise from Plate Impurities: Bolts, rust, etc., sometimes misclassified as characters. Mitigated via morphological cleaning and filtering.
Character Confusion: Similar-looking characters (O/0, B/8, I/1) frequently misrecognized. Addressed with filtering, normalization, and heuristic validation.

4. Results

Dataset: YOLOv8 trained on Indian plates (Roboflow); TrOCR used without fine-tuning.
Performance Metrics:
- YOLOv8 Detection Accuracy: mAP@0.5 = 94.6%
- TrOCR Character Error Rate (CER): 2.3%
- Overall Word Accuracy (full plate read): 91.8%
Impact of Post-Processing: Significantly improved OCR clarity and reliability.

Conclusion

We present a comprehensive and practicable ANPR system based upon YOLOv8 and TrOCR with added reasoning process to handle stacked plates correctly, and post-processing to generate clean outputs. The full system can accept a webcam feed in real time and has been proven to perform across a spectrum of lighting conditions and vehicle style plates. Future work will include real time multiple vehicle tracking and support for multi-language plates.

References

[1] Jocher, G., et al., “YOLOv8: Ultralytics 2023,” https://github.com/ ultralytics/ultralytics, 2023. [2] Li, M., et al., “TrOCR: Transformer-based Optical Character Recognition,” arXiv preprint arXiv:2109.10282, 2021. [3] Roboflow: Label and Train Custom Object Detection Models, https:// roboflow.com. [4] Baek, J., Kim, G., Lee, S., et al., “What is Wrong with Scene Text Recognition Model Comparisons?” ICCV, 2019. [5] Silva, S., Jung, C. R., “License Plate Detection and Recognition in Unconstrained Scenarios: A Survey,” IEEE Trans. on Intelligent Transportation Systems, vol. 18, no. 2, pp. 377–391, Feb. 2017. [6] Redmon, J., Farhadi, A., “YOLOv3: An Incremental Improvement,” arXiv preprint arXiv:1804.02767, 2018. [7] Zhang, H., et al., “Real-Time Automatic License Plate Recognition Based on Efficient Deep Learning Models,” Sensors, vol. 22, no. 3, 2022. [8] Smith, R., “An Overview of the Tesseract OCR Engine,” Proc. ICDAR, 2007. Discusses post-processing strategies for noisy OCR outputs. [9] OpenALPR Benchmark Datasets, https://github.com/openalpr/ benchmarks, accessed July 2025. [10] Liao, M., Shi, B., Bai, X., Wang, X., and Liu, W., “TextBoxes: A Fast Text Detector with a Single Deep Neural Network,” AAAI, 2017. [11] Wang, T., Wu, D. J., Coates, A., and Ng, A. Y., “End-to-End Text Recognition with Convolutional Neural Networks,” ICPR, 2012. [12] Huang, C. S., and Lee, H. J., “A Real-Time and Robust License Plate Localization Algorithm Based on Multiscale Morphological Processing,” IEEE ICIP, 2004. [13] Zherzdev, S., and Gruzdev, A., “LprNet: License Plate Recognition via Deep Neural Networks,” Journal of Open Source Software, vol. 3, no. 30, 2018. [14] Liu, Y., Chen, H., Shen, C., He, T., Jin, L., and Wang, L., “ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network,” CVPR, 2020. [15] Nayef, N., et al., “ICDAR2019 Robust Reading Challenge on ArbitraryShaped Text—RRC-ArT,” Proc. ICDAR, 2019. [16] Montazzolli, S., and Jung, C. R., “Real-Time Brazilian License Plate Detection and Recognition Using Deep Convolutional Neural Networks,” Journal of Electronic Imaging, vol. 26, no. 5, 2017. [17] Neuhold, G., Ollmann, T., Rota Bulo, S., and Kontschieder, P., “The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes,” ICCV, 2017.

Copyright

Copyright © 2025 Deon Menezes, Sajal Patel, Abdul Shaikh, Nishaad Parikh, Nishaad Parikh. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Download Paper

Paper Id : IJRASET74265

Publish Date : 2025-09-16

ISSN : 2321-9653

Publisher Name : IJRASET

DOI Link : Click Here